Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(sql): add a base SQLGlot backend for DuckDB and ClickHouse #7796

Merged
merged 2 commits into from
Dec 26, 2023

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Dec 18, 2023

Rewrite the DuckDB and ClickHouse backends using a new base SQLGlotBackend and associated compiler.

@kszucs kszucs changed the base branch from master to the-epic-split December 18, 2023 20:17
Copy link
Contributor

ACTION NEEDED

Ibis follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message.

Please update your PR title and description to match the specification.

@cpcloud cpcloud force-pushed the tes-sql branch 2 times, most recently from 8a5c21f to 76f516a Compare December 22, 2023 09:26
@cpcloud cpcloud added refactor Issues or PRs related to refactoring the codebase tests Issues or PRs related to tests ci Continuous Integration issues or PRs clickhouse The ClickHouse backend duckdb The DuckDB backend labels Dec 22, 2023
@cpcloud cpcloud force-pushed the tes-sql branch 5 times, most recently from c00c30b to 03e86bd Compare December 23, 2023 16:18
ibis/backends/base/sql/alchemy/registry.py Show resolved Hide resolved
ibis/examples/tests/test_examples.py Show resolved Hide resolved
ibis/backends/base/sqlglot/compiler.py Outdated Show resolved Hide resolved
ibis/backends/base/sqlglot/compiler.py Show resolved Hide resolved
ibis/backends/base/sqlglot/compiler.py Show resolved Hide resolved
ibis/backends/base/sqlglot/datatypes.py Outdated Show resolved Hide resolved
ibis/backends/duckdb/tests/test_datatypes.py Show resolved Hide resolved
ibis/backends/duckdb/tests/test_datatypes.py Show resolved Hide resolved
@@ -30,7 +30,7 @@ def test_tpc_h01(lineitem):
avg_qty=t.l_quantity.mean(),
avg_price=t.l_extendedprice.mean(),
avg_disc=t.l_discount.mean(),
count_order=t.count(),
count_order=lambda t: t.count(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kszucs Still not 100% sure whether this breakage is something we want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still failing? We can certainly fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, if I revert this change then the test fails with an integrity error.

ibis/backends/base/sqlglot/compiler.py Outdated Show resolved Hide resolved
ibis/backends/base/sqlglot/__init__.py Outdated Show resolved Hide resolved
ibis/backends/base/sqlglot/__init__.py Show resolved Hide resolved
ibis/backends/base/sqlglot/__init__.py Outdated Show resolved Hide resolved
ibis/backends/base/sqlglot/__init__.py Outdated Show resolved Hide resolved
ibis/backends/base/sqlglot/__init__.py Outdated Show resolved Hide resolved
it's alive!

tests run (and fail)

chore(duckdb): naive port of clickhouse compiler

fix(duckdb): hacky fix for output shape

feat(duckdb): bitwise ops (most of them)

feat(duckdb): handle pandas dtype mapping in execute

feat(duckdb): handle decimal types

feat(duckdb): add euler's number

test(duckdb): remove duckdb from alchemycon

feat(duckdb): get _most_ of string ops working

still some failures in re_exract

feat(duckdb): add hash

feat(duckdb): add CAST

feat(duckdb): add cot and strright

chore(duckdb): mark all the targets that still need attention (at least)

feat(duckdb): combine binary bitwise ops

chore(datestuff): some datetime ops

feat(duckdb): add levenshtein, use op.dtype instead of output_dtype

feat(duckdb): add blank list_schemas, use old current_database for now

feat(duckdb): basic interval ops

feat(duckdb): timestamp and temporal ops

feat(duckdb): use pyarrow for fetching execute results

feat(duckdb): handle interval casts, broken for columns

feat(duckdb): shove literal handling up top

feat(duckdb): more timestamp ops

feat(duckdb): back to pandas output in execute

feat(duckdb): timezone handling in cast

feat(duckdb): ms and us epoch timestamp support

chore(duckdb): misc cleanup

feat(duckdb): initial create table

feat(duckdb): add _from_url

feat(duckdb): add read_parquet

feat(duckdb): add persistent cache

fix(duckdb): actually insert data if present in create_table

feat(duckdb): use duckdb API read_parquet

feat(duckdb): add read_csv

This, frustratingly, cannot use the Python API for `read_csv` since that
does not support list of files, for some reason.

fix(duckdb): dont fully qualify the table names

chore(duckdb): cleanup

chore(duckdb): mark broken test broken

fix(duckdb): fix read_parquet so it works

feat(duckdb): add to_pyarrow, to_pyarrow_batches, sql()

feat(duckdb): null checking

feat(duckdb): translate uints

fix(duckdb): fix file outputs and torch output

fix(duckdb): add rest of integer types

fix(duckdb): ops.InValues

feat(duckdb): use sqlglot expressions (maybe a big mistake)

fix(duckdb): don't stringify strings

feat(duckdb): use sqlglot expr instead of strings for count

fix(duckdb): fix isin

fix(duckdb): fix some agg variance functions

fix(duckdb): for logical equals, use sqlglot not operator

fix(duckdb): struct not tuple for struct type
@cpcloud cpcloud force-pushed the tes-sql branch 2 times, most recently from 62eff1d to be26c52 Compare December 26, 2023 15:31
@cpcloud cpcloud changed the title refactor(sql): add a base SQLGlot backend based on the new relational operations and implementations for DuckDB and ClickHouse refactor(sql): add a base SQLGlot backend for DuckDB and ClickHouse Dec 26, 2023
Copy link
Member Author

@kszucs kszucs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Great refactor!

@kszucs kszucs merged commit 17adb81 into ibis-project:the-epic-split Dec 26, 2023
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous Integration issues or PRs clickhouse The ClickHouse backend duckdb The DuckDB backend refactor Issues or PRs related to refactoring the codebase tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants